The Internet contains a vast amount of information distributed over a multitude of computers connected by “The Net,” providing users with information on virtually any topic imaginable. Although large amounts of information are available, finding the desired information is not always easy or fast.
Search engines have been developed to address the problem of finding desired information on the Internet. Typically, a user who has an idea of the type of information desired enters one or more search terms, and a search engine returns a list of web pages that contain the term or terms. Alternatively, a user may want to browse through data, for example when the user is not sure what information is wanted.
Not surprisingly, web search is one of the premier applications on the Internet, resulting in substantial advertising revenue. Results of web-search queries are typically influenced by several metrics: 1) {C}: content relevance derived from documents' anchor text, title and headings, word frequency and proximity, file, directory, and domain names, and other more sophisticated forms of content analysis; 2) {U}: user behavior extrapolated from users' time-on-page, time-on-domain, click-through rates, etc.; 3) {P}: popularity in the global link structure, with authority, readability, and novelty typically determining the linkage.
With current practices, links to the pages that are most “relevant” according to the above criteria are then potentially clustered and delivered to users, who in turn browse the results to find the desired information. Although they have been researched in detail along most of the mentioned criteria, search engines still leave a lot to be desired. In particular, state-of-the-art search engines suffer from an important inefficiency: content redundancy. Specifically, for queries where learning about a subject is the objective, currently deployed search engines return unsatisfactory results because they consider the query coverage of each page individually rather than that of a set of pages as a whole.
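By way of illustration only, the redundancy problem can be sketched in a few lines of Python. The sketch assumes that each page can be summarized as a set of query "aspects" it covers; the page names, aspect labels, and helper functions are hypothetical. The second selection routine uses a standard greedy set-cover heuristic chosen purely for contrast, and is not a method taken from this document.

```python
from typing import Dict, List, Set

# Hypothetical example data: which aspects of a query each page covers.
pages: Dict[str, Set[str]] = {
    "page_a": {"history", "rules", "equipment"},
    "page_b": {"history", "rules", "equipment"},  # near-duplicate of page_a
    "page_c": {"history", "rules"},
    "page_d": {"tactics", "training"},
}
query_aspects: Set[str] = {"history", "rules", "equipment", "tactics", "training"}


def top_k_individual(pages: Dict[str, Set[str]], aspects: Set[str], k: int) -> List[str]:
    """Score every page on its own coverage and return the k best (ties by name)."""
    ranked = sorted(pages, key=lambda p: (-len(pages[p] & aspects), p))
    return ranked[:k]


def top_k_set_coverage(pages: Dict[str, Set[str]], aspects: Set[str], k: int) -> List[str]:
    """Greedy heuristic: repeatedly pick the page adding the most uncovered aspects."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(pages)
    while remaining and len(chosen) < k:
        best = min(
            remaining,
            key=lambda p: (-len((remaining[p] & aspects) - covered), p),
        )
        chosen.append(best)
        covered |= remaining.pop(best) & aspects
    return chosen


def coverage(selected: List[str], pages: Dict[str, Set[str]], aspects: Set[str]) -> int:
    """Number of distinct query aspects covered by the selected pages together."""
    covered: Set[str] = set()
    for p in selected:
        covered |= pages[p] & aspects
    return len(covered)


individual = top_k_individual(pages, query_aspects, 2)
as_a_set = top_k_set_coverage(pages, query_aspects, 2)
print(individual, coverage(individual, pages, query_aspects))
print(as_a_set, coverage(as_a_set, pages, query_aspects))
```

With this toy data, per-page scoring selects two near-duplicate pages that together cover three of the five aspects, whereas evaluating the result set as a whole selects two complementary pages that cover all five.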
From the foregoing, it is appreciated that there exists a need for systems and methods to ameliorate the shortcomings of existing practices.